Hyper-rectangle-based Discriminative Data Generalization and Applications in Data Mining

نویسندگان

  • Byron Ju Gao
  • Anoop Sarkar
چکیده

The ultimate goal of data mining is to extract knowledge from massive data. Knowledge is ideally represented as human-comprehensible patterns from which end-users can gain intuitions and insights. Axis-parallel hyper-rectangles provide interpretable generalizations for multi-dimensional data points with numerical attributes. In this dissertation, we study the fundamental problem of rectangle-based discriminative data generalization in the context of several useful data mining applications: cluster description, rule learning, and Nearest Rectangle classification. Clustering is one of the most important data mining tasks. However, most clustering methods output sets of points as clusters and do not generalize them into interpretable patterns. We perform a systematic study of cluster description, where we propose novel description formats leading to enhanced expressive power and introduce novel description problems specifying different trade-offs between interpretability and accuracy. We also present efficient heuristic algorithms for the introduced problems in the proposed formats. If-then rules are known to be the most expressive and human-comprehensible representation of knowledge. Rectangles are essentially a special type of rules with all the attributional conditions specified whereas normal rules appear more compact. Decision rules can be used for both data classification and data description depending on whether the focus is on future data or existing data. For either scenario, smaller rule sets are desirable. We propose a novel rectangle-based and graph-based rule learning approach that finds rule sets with small cardinality. We also consider Nearest Rectangle learning to explore the data classification capacity of generalized rectangles. We show that by enforcing the so-called “right of inference”, Nearest Rectangle learning can potentially become an interpretable hybrid inductive learning method with competitive accuracy.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Effective classification of 3D image data using partitioning methods

We propose partitioning-based methods to facilitate the classification of 3-D binary image data sets of regions of interest (ROIs) with highly non-uniform distributions. The first method is based on recursive dynamic partitioning of a 3-D volume into a number of 3-D hyper-rectangles. For each hyper-rectangle, we consider, as a potential attribute, the number of voxels (volume elements) that bel...

متن کامل

Nested Hyper-Rectangle Learning Model for Remote Sensing: Land Cover Classification

This study presents an exemplar-based nested hyperrectangle learning model (NHLM) which is an efficient and accurate supervised classification model. The proposed model is based on the concept of seeding training data in the Euclidean m-space (where m denotes the number of features) as hyper-rectangles. To express the exceptions, these hyper-rectangles may be nested inside one another to an arb...

متن کامل

A Systematic Review of Data Mining Applications in Digital Libraries

Purpose: Study aimed to identify the applications of data mining in the provision of services, collection and management of digital libraries. Methodology: This is an applied study in terms of purpose and in terms of method is qualitative research that have been done by systematic review method. For this purpose, articles have been obtained by searching databases of Springer, Emerald, ProQuest,...

متن کامل

Discriminative pattern mining and its applications in bioinformatics

Discriminative pattern mining is one of the most important techniques in data mining. This challenging task is concerned with finding a set of patterns that occur with disproportionate frequency in data sets with various class labels. Such patterns are of great value for group difference detection and classifier construction. Research on finding interesting discriminative patterns in class-labe...

متن کامل

Discriminative Feature Selection for Uncertain Graph Classification

Mining discriminative features for graph data has attracted much attention in recent years due to its important role in constructing graph classifiers, generating graph indices, etc. Most measurement of interestingness of discriminative subgraph features are defined on certain graphs, where the structure of graph objects are certain, and the binary edges within each graph represent the "presenc...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2007